12.06.2019

Introduction

Results of our little survey:

Name    Field          Research
Henrik  Bibliometrics  Statistics, coding, prose

Topics du jour:

  • The modern research cycle
  • The Open Science movement
  • Incorporating workflow thinking into your research

Part I: The modern research project

An idealised research project

Requirements

  • Data management plan
  • Publication plan
  • Dissemination plan

Data management plan

  • What do you collect?
  • How do you treat it?
  • How will you keep/share it?

Publication plan

  • Where do you plan to publish?
  • What part of the project will make it into which publications?

The publishing cycle

The publishing cycle, really

Dissemination plan

  • How will you present your research?
  • In which channels?

A more realistic project plan

Is this you?

Why all this stuff?

[W]e have two major points to consider. First, due to a lack of adequate incentives in the reward structure of professional science […] actual replication attempts are rarely carried out. Second, to the extent that they are carried out, it can be well-nigh impossible to say conclusively what they mean, whether they are “successful” (i.e., showing similar, or apparently similar, results to the original experiment) or “unsuccessful” (i.e., showing different, or apparently different, results to the original experiment).

Earp, B. and D. Trafimow (2015) Replication, falsification, and the crisis of confidence in social psychology. Frontiers in Psychology 6:621

This is the whole abstract of an interesting paper in the field of genomic biology:

The spreadsheet software Microsoft Excel, when used with default settings, is known to convert gene names to dates and floating-point numbers. A programmatic scan of leading genomics journals reveals that approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions.

Ziemann, M., Y. Eren, A. El-Osta (2016) Gene name errors are widespread in the scientific literature. Genome Biology 17:177
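A check of this kind can be approximated in a few lines. The sketch below is not the scan used in the paper. It only flags entries in a gene list that look like Excel's default date rendering (for example "2-Sep", where a gene symbol such as SEPT2 used to be) or like scientific notation. The function name and both patterns are assumptions made for illustration, and other locale settings would render dates differently.

```python
import re

# Assumed patterns (illustrative only): Excel's default English date
# rendering such as "2-Sep", and scientific notation such as "2.31E+13".
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)
FLOAT_LIKE = re.compile(r"^\d+(\.\d+)?E\+?\d+$", re.IGNORECASE)


def suspicious_gene_names(names):
    """Return the entries that look like dates or floats, in input order."""
    flagged = []
    for name in names:
        cleaned = name.strip()
        if DATE_LIKE.match(cleaned) or FLOAT_LIKE.match(cleaned):
            flagged.append(name)
    return flagged


# Hypothetical gene list, as it might look after a round trip through Excel.
genes = ["TP53", "2-Sep", "BRCA1", "1-Mar", "2.31E+13"]
print(suspicious_gene_names(genes))
```

A flagged entry only signals that something may have been converted. The original symbol has to be recovered from the raw data, which is one more reason to keep that data untouched.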

Storytime

Here are some rows of some of the columns:

s4  s6  s7  s8  s9
4   4   1   NA  46
3   1   1   NA  125
3   1   1   NA  90
3   3   1   NA  156
4   5   1   NA  78
  • Only problem: I don’t know where I put the codebook!
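One way to avoid losing the codebook is to keep it as a small machine-readable file next to the data and to check it against the column names. The sketch below assumes a JSON codebook. The descriptions are placeholders, not the real meaning of the columns above, and the function name is hypothetical.

```python
import json

# Hypothetical codebook: the descriptions are placeholders, since the
# real meaning of these columns is exactly what went missing.
codebook = {
    "s4": {"description": "placeholder: what s4 measures", "missing": "NA"},
    "s9": {"description": "placeholder: what s9 measures", "missing": "NA"},
}


def undocumented_columns(columns, codebook):
    """Return the columns that have no codebook entry, in input order."""
    return [column for column in columns if column not in codebook]


columns = ["s4", "s6", "s7", "s8", "s9"]

# Text that could be saved alongside the data file, e.g. as codebook.json.
codebook_text = json.dumps(codebook, indent=2)

print(undocumented_columns(columns, codebook))
```

Running a check like this whenever the data changes makes a missing description visible right away, rather than years later.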

Part II: How to deal with this?

Just don’t do it

The arrival of Open Science

[accountability, reproducibility, transparency]

Part III: Examples of digital workflows

Collaborating

  • From simple to

Keeping track

Documenting

Sharing

The trade-offs

  • There are powerful, efficient tools at our disposal
  • There is a learning curve of varying steepness
  • Maybe

Resources